ABSTRACT


Several segmentation techniques were applied
to a set of 51 FLIR (Forward-Looking InfraRed)
images of four different types, and the results
were compared to hand segmentations. There were
substantial differences in performance, indicating
that the choice of proper technique is very
important. The segmentation techniques used were
"superslice", "pyramid spot detection", two
versions of "relaxation", "pyramid linking", and
"superspike". One technique, "superspike",
outperformed all the others, detecting 88% of the
targets and yielding only 1.6 false alarms per
true target.


1. Introduction


Object detection in infrared images is a
problem of considerable practical interest [1].
Numerous techniques have been developed for the
primary purpose of segmenting FLIR (Forward-Looking
InfraRed) images into objects and background
(e.g., [1,2]); in particular, [3] is a
survey of such techniques, and [4] describes a
comparative study. This paper summarizes the
results of another comparative study; further
details about the study can be found in [5].

Section 2 describes the segmentation techniques
that were tested; Section 3 describes the
evaluation procedure; and Section 4 summarizes the
results of the study.

2. Segmentation techniques


The techniques tested are briefly described in
the following paragraphs; for further details see
the cited references.

2.1 Superslice [6]

This technique was quite successful in
earlier studies of FLIR object detection [1]. A
set of gray level thresholds is applied to the
given image, and for each threshold, connected
components of above-threshold points are extracted.
The gray level gradient is also measured for the
image, and points at which it is a local maximum
are determined. A component is selected as a
possible object if many gradient maxima coincide
with its border and surround it.

2.2 Pyramid spot detection [7]

This technique is designed to extract compact
objects of arbitrary size from an image; it too
performed well in earlier studies. We build an
exponentially tapering "pyramid" of reduced-resolution
versions of the image by successive
block averaging, e.g., using nonoverlapping 2x2
blocks, or 4x4 blocks with 50% overlap in each
direction, so that each image is half the size
(1/4 the area) of the preceding. At each level of
the pyramid, we apply a standard spot-detection
operator; e.g., we compare each pixel to its
eight neighbors, and judge a spot to be present if
they differ sufficiently. A spot that is detected
in this way should correspond to a compact object
on a contrasting background in the original image.
For each such spot, we consider the portion of the
original image corresponding to the pixel and its
neighbors, and apply a threshold to this portion,
chosen midway between the gray level of the pixel
(an average of a block of gray levels in the
original image) and the average gray level of its
neighbors (an average of block averages). This
thresholding generally extracts the object that
gave rise to the spot detection.

2.3 Relaxation [8]

"Relaxation" methods of object extraction
have been extensively studied. The basic approach
is to initially assign "object" and "background"
probabilities to each pixel, based on their
distances from the ends of the grayscale. The
probabilities are then iteratively adjusted based on
the probabilities of the neighboring pixels, with
like reinforcing like. When this is done, the
probabilities tend to converge to relative
certainty ((0,1) or (1,0)), and yield a good
segmentation of the image into objects and background.

An alternative, also investigated, used three
rather than two classes, assigning initial
probabilities based on distances from the ends and
midpoint of the grayscale; thus the pixels were not
forced to choose between "target" and "background",
but also had a third option ("clutter").
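
The two-class scheme can be illustrated with a minimal sketch. The
update rule used here (a common multiplicative relaxation update in
which like labels on the eight neighbors support each other and unlike
labels oppose each other) is an assumption; the rule actually tested is
the one specified in [8]. The function name relax, the use of the image
minimum and maximum as the ends of the grayscale, and the convention
that bright pixels are objects are likewise illustrative choices.

```python
def relax(img, iterations=10):
    """Two-class relaxation sketch; returns P(object) for every pixel."""
    h, w = len(img), len(img[0])
    lo = min(min(row) for row in img)
    hi = max(max(row) for row in img)
    span = (hi - lo) or 1  # avoid division by zero on a flat image
    # Initial P(object): relative distance from the dark end of the
    # grayscale (assumption: bright pixels are objects).
    p = [[(g - lo) / span for g in row] for row in img]
    for _ in range(iterations):
        new = []
        for r in range(h):
            new_row = []
            for c in range(w):
                nbrs = [p[rr][cc]
                        for rr in (r - 1, r, r + 1)
                        for cc in (c - 1, c, c + 1)
                        if (rr, cc) != (r, c) and 0 <= rr < h and 0 <= cc < w]
                # Net neighbor support for "object", in [-1, 1].
                q = 2 * sum(nbrs) / len(nbrs) - 1 if nbrs else 0.0
                obj = p[r][c] * (1 + q)        # like reinforces like
                bkg = (1 - p[r][c]) * (1 - q)
                total = obj + bkg
                new_row.append(obj / total if total > 0 else p[r][c])
            new.append(new_row)
        p = new
    return p


# A bright 3x3 object with one dim interior pixel, plus one
# slightly bright background pixel in the corner.
image = [
    [40, 20, 20, 20, 20],
    [20, 80, 80, 80, 20],
    [20, 80, 60, 80, 20],
    [20, 80, 80, 80, 20],
    [20, 20, 20, 20, 20],
]
prob = relax(image)
```

In this example the dim interior pixel is pulled toward "object" by its
neighbors, while the corner pixel is pulled toward "background".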

2.4 Pyramid linking [9]

This is a method of segmenting an image based
on creating links between pixels at successive
levels of a "pyramid". We build the pyramid using
overlapping 4x4 blocks; thus each pixel has 16
"sons" (on the level below) that contribute to its
average, and four "fathers" (on the level above)
to whose average it contributes. We now link each
pixel to the father whose value (=average) is
closest to its own. We then recompute the
averages, allowing only those sons that are linked to
a pixel to contribute to its average. We now
change the links based on these new averages, then
recompute the averages again, and so on. This
process stabilizes after a few iterations; at this
stage the links define subtrees of the pyramid,
rooted at the top level, which we take to be 2x2
so that there are (at most) four trees. The sets
of leaves of these trees (pixels in the original
image) thus define a segmentation of the original
image into at most four subsets.
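
The father/son geometry and a single link-and-recompute step can be
sketched as follows. The block alignment is an assumption: father
(i, j) is taken to cover rows 2i-1..2i+2 and columns 2j-1..2j+2 of the
level below, clipped at the image border, so border pixels have fewer
than four fathers. The function names, the tie-breaking rule (first
candidate wins), and the treatment of a father left with no linked sons
are also illustrative; see [9] for the method as tested.

```python
def sons(i, j, n):
    """Sons of father (i, j) in the n x n level below (4x4 block, clipped)."""
    rows = [r for r in range(2 * i - 1, 2 * i + 3) if 0 <= r < n]
    cols = [c for c in range(2 * j - 1, 2 * j + 3) if 0 <= c < n]
    return [(r, c) for r in rows for c in cols]


def fathers(r, c, n):
    """Fathers of pixel (r, c) of an n x n level, on the level above."""
    m = n // 2
    rows = [i for i in ((r - 1) // 2, (r - 1) // 2 + 1) if 0 <= i < m]
    cols = [j for j in ((c - 1) // 2, (c - 1) // 2 + 1) if 0 <= j < m]
    return [(i, j) for i in rows for j in cols]


def reduce_level(level):
    """Build the next (half-size) level by averaging each father's sons."""
    n = len(level)
    up = []
    for i in range(n // 2):
        row = []
        for j in range(n // 2):
            block = [level[r][c] for r, c in sons(i, j, n)]
            row.append(sum(block) / len(block))
        up.append(row)
    return up


def link(level, up):
    """Link every pixel to the father whose value is closest to its own."""
    n = len(level)
    links = {}
    for r in range(n):
        for c in range(n):
            links[(r, c)] = min(
                fathers(r, c, n),
                key=lambda f: abs(up[f[0]][f[1]] - level[r][c]))
    return links


def recompute(level, up, links):
    """Recompute father averages from linked sons only."""
    m = len(up)
    total = [[0.0] * m for _ in range(m)]
    count = [[0] * m for _ in range(m)]
    for (r, c), (i, j) in links.items():
        total[i][j] += level[r][c]
        count[i][j] += 1
    # Assumption: a father with no linked sons keeps its old value.
    return [[total[i][j] / count[i][j] if count[i][j] else up[i][j]
             for j in range(m)] for i in range(m)]


# A 4x4 image with a dark left half and a bright right half.
level = [[0, 0, 10, 10] for _ in range(4)]
up = reduce_level(level)
links = link(level, up)
relinked = recompute(level, up, links)
```

After one step the fathers' values already separate into the two gray
levels present in the image; a full implementation would repeat the
link and recompute steps at every level until the links stop changing.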

2.5 "Superspike" [10]

This is a method of image smoothing based on
iterated selective local averaging. Each pixel is
averaged with those of its neighbors that satisfy
the following criteria, based on the image's
histogram:

a) The neighbor is more probable than the
pixel, i.e., its gray level has a higher
value in the histogram.

b) The histogram has no concavity between the
gray levels of the pixel and the neighbor
(as would be the case if they belonged to
two different peaks, or to a peak and
shoulder).

When this process is iterated a few times, the
histogram generally turns into a small set of
spikes. The image can then be segmented by
mapping spikes into nearby taller ones until only five
spikes remain, thus segmenting the image into
five subsets. The choice of five classes was an
arbitrary one, based on preliminary experiments
in which it was found that using fewer classes
tended to merge some objects into the background.
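
One smoothing pass can be sketched as follows. The precise form of
criterion (b) is not given here, so the sketch assumes one reading of
it: the histogram must never decrease while moving from the pixel's
gray level toward the neighbor's. The function name superspike_pass,
the use of the eight neighbors, and the rounding of the average are
also assumptions; see [10] for the method as tested.

```python
def superspike_pass(img, levels=256):
    """One pass of histogram-guided selective local averaging."""
    hist = [0] * levels
    for row in img:
        for g in row:
            hist[g] += 1

    def no_concavity(a, b):
        # Assumed reading of criterion (b): the histogram never
        # decreases while moving from gray level a toward b.
        step = 1 if b > a else -1
        return all(hist[g] <= hist[g + step] for g in range(a, b, step))

    h, w = len(img), len(img[0])
    out = []
    for r in range(h):
        out_row = []
        for c in range(w):
            g = img[r][c]
            chosen = [g]  # the pixel itself always takes part
            for rr in (r - 1, r, r + 1):
                for cc in (c - 1, c, c + 1):
                    if (rr, cc) != (r, c) and 0 <= rr < h and 0 <= cc < w:
                        n = img[rr][cc]
                        # Criterion (a), then criterion (b).
                        if hist[n] > hist[g] and no_concavity(g, n):
                            chosen.append(n)
            out_row.append(int(sum(chosen) / len(chosen) + 0.5))
        out.append(out_row)
    return out


# An isolated gray level 6 is absorbed by the dominant level 5.
spot = superspike_pass([[5, 5, 5], [5, 6, 5], [5, 5, 5]], levels=8)

# Levels 2 and 4 are separate peaks (valley at 3), so a pixel at
# level 4 is not averaged with its neighbors at level 2.
peaks = superspike_pass([[2, 2, 2, 2, 2], [2, 3, 4, 4, 4]], levels=8)
```

Iterating such a pass sharpens the histogram peaks; the final merging
of small spikes into taller ones is not shown.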

3. Methodology

The overall approach used in the comparative
study was as follows:

1) Each technique being tested (Section 2) was
applied to the given set of images, yielding
a classification of each image into
subsets. Connected component labelling
was performed on the resulting classified
images, yielding a set of regions.

2) Regions that were too large, too small, or
too elongated to be targets were eliminated.
In our main study, the criteria for
acceptability were

In addition, regions having the wrong
polarity relative to the mean image gray
level were eliminated.

3) For each surviving region, the coordinates
of its centroid and the dimensions of its
upright circumscribing rectangle were
computed. The centroids and circumrectangles
of the true targets were also
known (from ground truth information and
hand segmentation). A target was said to
have been detected if the x and y displacements
between a region centroid and a
true target centroid were at most half the
true target's rectangle dimensions.

Region centroids not satisfying these conditions
were considered to be false alarms.
The "segmentation accuracy" for each detected
target was measured by the fraction
of overlap between the circumrectangle of
the detected region and that of the true
target. "Extra detections" were said to
occur when more than one region centroid
occurred in the inner half of a true
target's rectangle; all such detections
were counted in computing the average
segmentation accuracy. These methods of
evaluating a segmentation were proposed in
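
Steps 1 and 3 above can be sketched as follows. This is only an
illustration under stated assumptions: 4-connectivity is used for the
component labelling, the detection tolerance is taken literally as half
the full width and height of the true target's rectangle, and the names
regions_of and score are hypothetical. The region filtering of step 2
and the overlap-based segmentation accuracy are not included.

```python
from collections import deque


def regions_of(classes):
    """Label connected components (4-connectivity, an assumption).

    Returns one tuple (class, centroid_x, centroid_y, width, height)
    per region; width and height are those of the upright
    circumscribing rectangle."""
    h, w = len(classes), len(classes[0])
    seen = [[False] * w for _ in range(h)]
    found = []
    for r0 in range(h):
        for c0 in range(w):
            if seen[r0][c0]:
                continue
            cls = classes[r0][c0]
            seen[r0][c0] = True
            queue, pixels = deque([(r0, c0)]), []
            while queue:  # breadth-first flood fill of one region
                r, c = queue.popleft()
                pixels.append((r, c))
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1)):
                    if (0 <= rr < h and 0 <= cc < w
                            and not seen[rr][cc] and classes[rr][cc] == cls):
                        seen[rr][cc] = True
                        queue.append((rr, cc))
            xs = [c for _, c in pixels]
            ys = [r for r, _ in pixels]
            found.append((cls, sum(xs) / len(xs), sum(ys) / len(ys),
                          max(xs) - min(xs) + 1, max(ys) - min(ys) + 1))
    return found


def score(centroids, targets):
    """Return (targets detected, extra detections, false alarms).

    centroids: (x, y) of each surviving region.
    targets: (x, y, width, height) of each true target rectangle."""
    hits = [0] * len(targets)
    false_alarms = 0
    for x, y in centroids:
        for k, (tx, ty, tw, th) in enumerate(targets):
            if abs(x - tx) <= tw / 2 and abs(y - ty) <= th / 2:
                hits[k] += 1
                break
        else:  # no target matched this centroid
            false_alarms += 1
    detected = sum(1 for n in hits if n)
    extra = sum(n - 1 for n in hits if n)
    return detected, extra, false_alarms


# A 2x2 candidate region near the target and a stray single pixel.
classes = [[0] * 6 for _ in range(6)]
for r, c in ((1, 1), (1, 2), (2, 1), (2, 2), (5, 5)):
    classes[r][c] = 1
candidates = [(x, y) for cls, x, y, _, _ in regions_of(classes) if cls == 1]
result = score(candidates, [(2.0, 2.0, 4, 4)])
```

Here the 2x2 region counts as a detection of the single target and the
stray pixel counts as a false alarm.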


4. Experiments

In a pilot study, all six techniques (including
both two-class and three-class relaxation) were
applied to three image samples (see Figure 1).
Figure 1 also shows the resulting segmented images.
We see that the pyramid spot technique did not
perform very well. This is not too surprising,
since this technique was designed for the
extraction of isolated objects on a contrasting
background. Results with the relaxation, pyramid
linking, and superspike techniques looked more
promising, and it was therefore decided to use all
of them in the main study. The superslice
technique was not used in the main study because of its
comparatively high computational cost, which made
its use relatively impractical.

The main study used a set of 51 FLIR images
supplied by Westinghouse Systems Development
Division, from Navy (Nos. 2-10), Army (Nos.
11-30, 55-70), and Air Force (Nos. 31-36) sources
(Figure 2).* All images are 128x128; Nos. 11-30
were obtained from 64x64 images by horizontal
and vertical reflection, in order to present the
targets in four orientations. The target types
and locations are listed in Table 1.

*Information about the data base can be
obtained from Mr. Bruce Schacter, Westinghouse
Systems Development Division, Baltimore, MD 21203.


The four selected techniques (two- and three-class
relaxation, pyramid linking, and superspike)
were applied to these images. (In the case of
images 11-30, they were applied to only one
quadrant, since the methods are essentially
orientation-invariant; the scores (detections and false
alarms) obtained in this way were multiplied by
4.) The pyramid linking algorithm was designed
for 64x64 images;* in order to apply it to images
2-10, 31-36, and 55-70, they were resampled down
to that size, and the outputs (centroids and
rectangles) were scaled up in order to compare
them with the ground truth.

Figure 3 shows the segmentation results using
the four methods for each of the 51 images. Table
2 summarizes, by image class, the number of targets
present, the number correctly detected, the number
of extra detections, the number of false alarms,
and the segmentation accuracy. Detailed results
for the 51 individual images are given in [5].

We see from these results that segmentation
accuracy does not vary greatly among the methods;
it ranges between about 0.5 and 0.8 in all cases.
Extra detections are also not a significant factor,
except perhaps for the pyramid linking and
superspike methods as applied to the NVL data (images
11-30). As regards correct detections and false
alarms, 3-class relaxation and superspike were the
best methods (though no method was very good) for
the Navy images; pyramid linking and superspike had
good detection rates for the NVL data, but the
former had a much higher false alarm rate; and
superspike was by far the best method for the Air
Force and NVL flight test images, making it the
best method overall. It detected 111 of the 126
targets (over 88%) with only 26 extra detections
and 202 false alarms (about 1.6 per true target),
and its segmentation accuracy was a reasonable 0.66.
The next best method, pyramid linking (which, it
should be recalled, was applied to half-resolution
versions of images Nos. 2-10, 31-36, and 55-70),
detected only 63% of the targets and had many more
false alarms (over 5 per target). For further
details see [5].
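
The overall rates quoted above follow directly from the "Overall" rows
of Table 2. As a small worked check (the variable and function names
are illustrative only):

```python
TARGETS = 126  # total number of true targets in the 51 images

# (correctly detected, extra detections, false alarms), taken from
# the "Overall" rows of Table 2.
overall = {
    "2-class relaxation": (44, 0, 150),
    "3-class relaxation": (38, 10, 205),
    "pyramid linking": (79, 34, 675),
    "superspike": (111, 26, 202),
}


def rates(detected, false_alarms, targets=TARGETS):
    """Return (percent of targets detected, false alarms per target)."""
    return 100.0 * detected / targets, false_alarms / targets


summary = {name: rates(d, fa) for name, (d, _, fa) in overall.items()}
for name, (pct, fa) in summary.items():
    print(f"{name:20s} {pct:5.1f}% detected, {fa:.2f} false alarms per target")
```

For superspike this gives 111/126, a little over 88%, and 202/126,
about 1.6 false alarms per true target; for pyramid linking it gives
about 63% and a little over 5 false alarms per target.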

5. Concluding remarks

The results of the main study show that one
method, "superspike", performed substantially
better on the Westinghouse data base than the other
methods tested. It detected nearly 90% of the true
targets and gave only 1.6 false alarms per target.
Note that these results were obtained using
segmentation alone, in conjunction with very crude
size and height:width criteria. If the
segmentation step were followed by a classification
algorithm, much better performance could be expected.

Some further improvement in performance
can undoubtedly be obtained by further
refining the segmentation process. However, there
are limits to what can be achieved in this way by
algorithms that incorporate so little knowledge
about the nature of the targets. In order to attain
a significantly higher level of performance, it
will probably be necessary to develop a
knowledge-driven system capable of some degree of reasoning
about the regions extracted by the initial
segmentation.

Figure 1. Results of pilot study (three examples)

Ground truth for the 51 images. (Cx, Cy) = centroid
coordinates; (Rx, Ry) = half-dimensions of circumrectangle. In
images 2-30, high gray levels are hot; in images 31-36
and 55-70, low gray levels are hot.

83.5

85.5 

56 

84 

85 

57 

91 

93.5 

58 

102 

103.5 

59 

96 

98.5 

60 

86 

88.5 

61 

76.5 

78 

62 

90 

94 

63 

84.5 

84 

64 

85 

84.5 

65 

93.5 

95.5 

66 

102.5 

103 

67 

100 

102 

68 

90 

91.5 

69 

78.5 

79.5 

70 

96.5 

97.5 

71.5 

7.5 

59.6 

12.5 

75 

11 

63.5 

13.5 

70 

or 

62.5 

14 

81.5 

3.5 

105.5 

3 

78 

3.5 

106.5

2.5 

74.5 

3.5 

106.5

3 

72.5 

4.5 

108 

3 

68.5

4.5 

110 

4 

52.5 

5.5 

103 

5 

14.5 

7 

76 

6.5 

18.5 

11.5 

98.5 

8.5 

4 

2 

19.5 

2.5 

5 

2.5 

25.5 

3 

9.5 

3.5 

31.5 

2 

14.5 

4 

40 

2.5 

24 

4.5 

53 

3.5 

24 

5.5 

61 

3 

12 

7 

56.5 

4 

54.5 

9.5 

118 

4 

Table 1, cont'd.

14 

tank 

17 

tank 

22.5 

tank 

19 

tank 

15.5 

tank 

7 

tank 

6 

APC 

7.5 

tank 

5 

APC 

8 

tank 

5 

APC 

9 

tank 

5.5 

APC 

11 

tank 

7.5 

APC 

13 

tank 

8.5 

APC 

14.5 

tank 

11.5 

APC 

18.5 

tank 

15 

APC 

4 

truck 

4 

jeep 

5 

truck 

A 

jeep 

1 

truck 

4 

jeep 

8 

jeep 

4.5

truck 

10.5 

truck 

5.5 

jeep 

11.5 

truck 

6.5 

jeep 

12 

truck 

8 

jeep 

19 

truck 

10.5 

jeep 

                                                 Correctly       Extra   False  Segmentation
Images              Targets  Method               detected  detections  alarms      accuracy

2-10                8        2-class relaxation          0           0      43             -
(Navy, China Lake)           3-class relaxation          2           0      67          0.70
                             Pyramid linking             0           0     145             -
                             Superspike                  3           0      77          0.51

11-30               80       2-class relaxation         40           0      92          0.73
(NVL data)                   3-class relaxation         20           8      92          0.49
                             Pyramid linking            72          32     392          0.67
                             Superspike                 76          24      60          0.64

31-36               6        2-class relaxation          2           0       9          0.74
(Air Force, TASVAL)          3-class relaxation          3           1      27          0.73
                             Pyramid linking             3           2     100          0.57
                             Superspike                  6           1      63          0.60

55-70               32       2-class relaxation          2           0       6          0.67
(NVL flight test)            3-class relaxation         13           1      19          0.65
                             Pyramid linking             4           0      38          0.80
                             Superspike                 26           1       2          0.73

Overall             126      2-class relaxation         44           0     150          0.73
                             3-class relaxation         38          10     205          0.58
                             Pyramid linking            79          34     675          0.68
                             Superspike                111          26     202          0.66

Table 2. Summary of results by image class.


